ABSTRACT



A method and system for recovering from a failure of a
processing node in a partitioned shared nothing database
processing system are provided. The processing system may
include a pair of processing nodes having twin-tailed- 
connected thereto a storage device. A first processing node 
of the pair of processing nodes has a first database instance 
running thereon which accesses a first data partition on the 
storage device prior to the failure. Upon detection of the 
failure, access to the first data partition on the storage device 
is provided to a third, spare processing node through the 
second processing node of the pair of processing nodes. The 
third processing node runs a replacement database instance 
for the first database instance which was running on the first 
processing node prior to the failure thereof. The replacement 
database instance accesses the first data partition on the 
storage device through the second processing node, thereby 
recovering from the failure of the first processing node. 
Access to the first data partition may include using a virtual 
shared disk utility having a server portion on the second 
processing node and a client portion on the third processing 
node. 

57 Claims, 4 Drawing Sheets 
METHOD AND SYSTEM FOR RECOVERY IN
A PARTITIONED SHARED NOTHING 
DATABASE SYSTEM USING VIRTUAL 
SHARE DISKS 

RELATED APPLICATION INFORMATION 

This application relates to the following commonly 
assigned U.S. Patent Applications: "METHOD AND SYS- 
TEM FOR DATABASE LOAD BALANCING", Ser. No. 
08/332,323, now U.S. Pat. No. 5,625,811 filed on Oct. 31, 
1994 and "APPLICATION-TRANSPARENT RECOVERY 
FOR VIRTUAL SHARED DISKS", Ser. No. 08/332,157,
filed on Oct. 31, 1994. 

Each of these Applications is hereby incorporated by 
reference herein in its entirety. 

TECHNICAL FIELD 

This invention relates to computer database processing 
systems. More particularly, this invention relates to recovery 
from a processing node failure in a shared nothing database 
processing system. 

BACKGROUND OF THE INVENTION 

Modern computer systems often involve multiple, indi- 
vidual processors or nodes which are interconnected via a 
communication network. Large amounts of information are 
often stored and processed in such systems. In addition to 
processing equipment, each node typically has digital stor- 
age devices (e.g., magnetic disks) for storing the informa- 
tion. The information is often arranged as a database that 
occupies the available storage space at the various nodes in 
the system. 

The techniques employed for arranging the required stor- 
age of, and access to a database in a computer system with 
multiple nodes are dependent on the requirements for the 
specific system. However, certain requirements are common 
to most systems. All data in the database should be available 
for access from any node in the system. The amount of 
storage overhead and processing overhead must be kept at a 
minimum to allow the system to operate efficiently, and the 
storage/access strategy must generally be immune to failure 
occurring at any one node. 

Two general techniques for database storage, or 
partitioning, are employed in modern systems. The first, data
sharing, involves providing physical access to all disks from 
each node in the system. However, to maintain coherency of 
the database, global locking or change lists are necessary to 
ensure that no two nodes inconsistently change a portion of 
the database. 

The second technique of data storage involves physically
partitioning the data and distributing the resultant partitions
to responsible or owner nodes in the system which become
responsible for transactions involving their own,
corresponding partitions.

This "shared nothing" architecture requires additional
communication overhead to offer access to all of the data to 
all nodes. A requesting node must issue database requests to 
the owner node. The owner node then either: (i) performs the 
requested database request related to its corresponding par- 
tition (i.e., function shipping) or (ii) transfers the data itself 
to the requesting node (i.e., I/O shipping). 
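
The difference between the two request styles can be sketched as follows. This is a minimal illustration only: the OwnerNode class, its method names, and the sample rows are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class OwnerNode:
    """Hypothetical owner node holding one partition as a list of rows."""
    name: str
    partition: list = field(default_factory=list)

    def function_ship(self, predicate):
        # Function shipping: the owner evaluates the request against its
        # own partition and returns only the matching rows.
        return [row for row in self.partition if predicate(row)]

    def io_ship(self):
        # I/O shipping: the owner transfers the data itself and the
        # requesting node evaluates the request.
        return list(self.partition)


def wanted(row):
    return row > 10


owner = OwnerNode("node_k", partition=[3, 8, 15, 22])

shipped_result = owner.function_ship(wanted)
local_result = [row for row in owner.io_ship() if wanted(row)]

rows_moved_function = len(shipped_result)  # only results cross the network
rows_moved_io = len(owner.io_ship())       # the whole partition crosses
```

Both styles yield the same answer; in this toy model they differ only in how much data crosses the network and in which node does the filtering.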

A problem with the shared nothing approach is the
potential for failure at any one node and the resultant 
inability of that node to accept or process database requests 
relating to its partition. 

Two principal methods are currently known for recovery
of a node failure in a shared nothing database system: (i)
asynchronous replication, where updates to the data are sent
to a replica asynchronously (see e.g., "An Efficient Scheme
for Providing High Availability," A. Bhide, A. Goyal, H.
Hsiao and A. Jhingran, SIGMOD '92, pgs. 236-245,
incorporated herein by reference); and (ii) recovery on a buddy
node to which disks of the failed node are
twin-tailed-connected. Twin-tailing disk units to buddy processing
nodes is known in the art, and involves a physical
connection between a single disk and more than one processing
node. In one mode of twin-tailing, only one node is active
and accesses the disk at any one time. In another mode of
twin-tailing, both nodes are allowed to access the disk
simultaneously, and conflict prevention/resolution protocols
are provided to prevent data corruption.

The primary advantage of method (i) is that it can recover
from either disk or node failures; however, the primary
disadvantages of this method are that data is mirrored,
consuming twice the disk capacity, and that overhead is
incurred during normal failure-free operation for propagating
data to the replica. The primary advantage of method (ii)
is that there is no overhead during normal operations;
however, the primary disadvantage is that after a failure,
twice the load is imposed on the buddy node and this can
lead to half the throughput for the entire cluster, because
query scans or transaction function calls to the buddy node
of the failed node become the bottleneck for the entire
cluster.
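
The halving of throughput under method (ii) can be illustrated with a short calculation. The model is an assumption made for illustration: every query touches every partition, so the busiest node gates completion, and each node normally carries a load of 1.0. The function name and the node count are hypothetical.

```python
def relative_throughput(loads):
    """Throughput relative to a cluster whose busiest node carries load 1.0.

    Assumes each query scans every partition, so the most heavily loaded
    node determines when the query completes.
    """
    return 1.0 / max(loads)


n = 8  # hypothetical number of database processing nodes

# Normal operation: every node serves exactly one partition.
normal = [1.0] * n

# Method (ii) after one failure: the failed node is gone and its buddy
# serves two partitions while the other nodes are unchanged.
after_buddy_takeover = [1.0] * (n - 2) + [2.0]

normal_throughput = relative_throughput(normal)
degraded_throughput = relative_throughput(after_buddy_takeover)
```

Under these assumptions the degraded figure is half the normal one regardless of n, because a single doubly loaded node bounds the whole cluster.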

What is required, therefore, is a technique for recovery
from a processing node failure in a shared nothing database
processing system, which does not incur significant
processing overhead during normal operation, or storage space
overhead for full data replication.

SUMMARY OF THE INVENTION 

A processing node failure recovery technique is provided
by the instant invention, which in one aspect relates to a
method and system for recovering from a failure of a first
processing node in a database processing system having a
plurality of processing nodes. A first database instance is run
on the first processing node prior to its failure. The first
processing node and the second processing node have
commonly connected thereto a first storage device for storing
first data for the first database instance. After detecting a
failure of the first processing node, access to the first data is
provided to a third processing node through the second
processing node. The first database instance is then run on
the third processing node which accesses the first data on the
first storage device through the second processing node.
Recovery from the failure of the first processing node is
therefore provided.

In a modified embodiment, the first data is copied from
the first storage device to a second storage device connected
to the third processing node. While running the first database
instance on the third processing node, subsequent updates to
the first database instance may be mirrored to the first
storage device and the copied data on the second storage
device. Following a restart of the first processing node, the
first processing node may be designated as a spare
processing node in the system for subsequent node failures.

The first storage device may comprise two storage devices
each twin-tailed-connected to the first and second processing
nodes.

Access to the first data through the second processing
node includes using a virtual shared disk utility having a
server portion on the second processing node and a client
portion on the third processing node.

Because the second processing node may also have running
thereon a second database instance of its own, access
to second data for the second database instance may be
provided to a fourth processing node through the second
processing node. In that case, the second database instance
can be thereafter run on the fourth processing node by
accessing the second data on the first storage device through
the second processing node. The second processing node
therefore would only be required to support a server portion
thereon, and the database instance processing is completely
offloaded from the second processing node to the third and
fourth processing nodes, which have running thereon
respective client portions of the virtual shared disk utility.

Additional embodiments and modifications to these
techniques are disclosed herein, including recovering the first
database instance on the second processing node, during
which an attempt is made to restart the first processing node.
If the attempting results in a successful restart, the first
database instance is restarted on the first processing node. If
the attempting does not result in a successful restart, the
database instance is started on the second processing node,
or a spare processing node, as discussed above.

The instant invention therefore provides an effective
recovery technique in a shared nothing database processing
system, which does not incur significant processing
overhead during normal operation, or storage space overhead for
full data replication.

BRIEF DESCRIPTION OF THE DRAWINGS

The subject matter which is regarded as the invention is
particularly pointed out and distinctly claimed in the
concluding portion of the specification. The invention, however,
both as to organization and method of practice together with
further objects and advantages thereof, may best be
understood by reference to the following detailed description of
the preferred embodiment(s) and the accompanying
drawings in which:

FIG. 1 depicts a database processing system having a
plurality of processing nodes, two spare processing nodes,
and storage devices connected to at least some of the
processing nodes;

FIG. 2 depicts a first embodiment of the present invention
wherein, following a node failure, a database instance is run
on one of the spare processing nodes and accesses data
through a virtual shared disk utility on the processing node
to which is connected a storage device having data for the
database instance thereon;

FIG. 3 is a flow diagram of the recovery steps involved
following the failure of one of the nodes;

FIG. 4 is a modified embodiment of the present invention
wherein two database instances are run on two respective
spare processing nodes, each accessing a virtual shared disk
server on another processing node to which is connected a
storage device having data for the two database instances;

FIG. 5 is yet another modified embodiment of the present
invention in which a copy is made of the data to storage
devices on the formerly spare processing nodes to support
recovery from future node failures;

FIG. 6 is a flow diagram depicting yet another modified
embodiment of the present invention wherein an attempt to
reboot the failed node is accompanied by a simultaneous
recovery of the database instance.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENT(S)

With reference to FIG. 1, a database processing system 10
is shown having a set of database processing nodes
20_1, ..., 20_n each normally running a respective database
instance DB_1, ..., DB_n. A suitable network (not shown)
provides communication between the nodes. Disks 30_k and
30_{k+1} are twin-tailed-connected to buddy processing nodes
20_k and 20_{k+1} using connections 25. Shown here is an
exemplary twin-tailed implementation, but in fact the disks
in general can be multi-tailed connected to multiple
processing nodes. FIG. 1 therefore shows node 20_k running a
database instance DB_k and its buddy node 20_{k+1} running
database instance DB_{k+1}.

Under normal operation, the disks are logically
partitioned among the buddy nodes, so that one node logically
owns one subset of the twin-tailed disks and the buddy node
the remainder. There may be on the order of tens, possibly
up to hundreds, of database processing nodes. In addition, a
set of spare processing nodes 40_1, 40_2 are configured in the
system. There may be at least one spare node, preferably
two, and possibly more spare nodes in the system.

The technique of the present invention for recovering
from a processing node failure is depicted in FIG. 2. This
figure illustrates the case when the node 20_{k+1} that was
running database instance DB_{k+1} fails. The prior art method
(ii) (discussed above) would recover database instance
DB_{k+1} on buddy node 20_k, so that node 20_k would run both
of the database instances DB_k and DB_{k+1} after a failure. As
discussed above, this could lead to twice the load on node
20_k, with a resultant loss of performance for the entire
system. FIG. 2 illustrates the disclosed technique to solve
this problem: after the failure, database instance DB_{k+1} is run
on a separate, spare node 40_1. This database instance still
needs to access the same disks that were logically assigned
to it prior to the failure. As illustrated in FIG. 2, after a
failure, the disks that were in the logical partition of the
failed node 20_{k+1} are reconfigured for access through the
buddy node 20_k via communication path 50 over a suitable
communication network (not shown). This access is
provided, for example, by a Recoverable Virtual Shared
Disk utility (RVSD) disclosed in the above-incorporated
U.S. Patent Application entitled "Application-Transparent
Recovery for Virtual Shared Disks". On failure of a node,
RVSD transparently switches over to provide access to disks
30_k and 30_{k+1} via the buddy node 20_k, from any node in the
system. Following RVSD recovery, the database instance
DB_{k+1} that had been running on the failed node is restarted
on one of the backup nodes 40_1. Instance DB_{k+1} logically
owns the same disks and accesses the database partition of
the failed instance by making disk read/write requests
through an RVSD client portion on node 40_1 to a server
portion on node 20_k. The RVSD utility transparently ships
the requests to node 20_k and retrieves the proper data.

Using the technique depicted in FIG. 2, node 20_k has a
load corresponding to the database load of instance DB_k and
the VSD server load supporting instance DB_{k+1} on node 40_1.
This load would be less than that of recovering a full
instance of DB_{k+1} on node 20_k after a failure. Thus, with this
option, the throughput after a failure will be somewhat
degraded because of the dual responsibilities of node 20_k,
but this throughput is considerably more than in the prior art
approaches discussed above.

FIG. 3 is a flow diagram of the steps necessary to
implement the recovery technique of FIG. 2. Following a
failure of node 20_{k+1} in Step 100, a spare node 40_1 is
selected to run the instance DB_{k+1} in Step 110. Assuming
that disk 30_{k+1} holds the partitions relevant to instance
DB_{k+1}, node 20_k performs a VSD takeover of disk 30_{k+1} in
Step 120. A proper client portion of VSD is configured on
node 40_1 in Step 130. In Step 140, all other nodes are then
informed (via update of the appropriate tables on the system)
that instance DB_{k+1} will now be running on node 40_1.
Therefore, all relevant requests for database instance DB_{k+1}
will be directed to node 40_1. Finally, in Step 150, instance
DB_{k+1} is started on node 40_1.
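
Steps 100 through 150 can be sketched as bookkeeping on a small cluster model. This is an illustrative sketch only: the Cluster class, its fields, and the recover function are hypothetical names, and the real VSD takeover and client configuration are reduced to dictionary updates.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    routing: dict      # instance -> node running it (the system tables)
    disk_server: dict  # disk -> node currently serving it
    spares: list       # spare nodes not yet in use
    vsd_clients: dict = field(default_factory=dict)  # node -> remote disks


def recover(cluster, failed_node, buddy_node, instance, disk):
    """Apply Steps 110-150 after a failure is detected (Step 100)."""
    if cluster.routing[instance] != failed_node:
        raise ValueError(f"{instance} was not running on {failed_node}")
    if not cluster.spares:
        raise RuntimeError("no spare node available")
    # Step 110: select a spare node to run the instance.
    spare = cluster.spares.pop(0)
    # Step 120: the buddy node performs a VSD takeover of the disk.
    cluster.disk_server[disk] = buddy_node
    # Step 130: configure a VSD client portion on the spare node.
    cluster.vsd_clients.setdefault(spare, set()).add(disk)
    # Steps 140 and 150: update the tables so that requests for the
    # instance go to the spare, where the instance is then started.
    cluster.routing[instance] = spare
    return spare


cluster = Cluster(
    routing={"DB_k": "20_k", "DB_k+1": "20_k+1"},
    disk_server={"30_k": "20_k", "30_k+1": "20_k+1"},
    spares=["40_1", "40_2"],
)
chosen = recover(cluster, "20_k+1", "20_k", "DB_k+1", "30_k+1")
```

After the call, requests for DB_k+1 route to 40_1 while its disk is served by 20_k, which mirrors the arrangement of FIG. 2.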

A modified embodiment of the present invention is
depicted in FIG. 4. FIG. 4 shows database instance DB_k also
being restarted on another spare node 40_2, with remote VSD
access to its data via path 60 through node 20_k. By running
both database instances DB_k and DB_{k+1} on spare nodes 40_2
and 40_1, the load on node 20_k is only that required to handle
the VSD accesses from both of these instances.
Measurements indicate that, with this configuration, the VSD load on
node 20_k is likely to be less than that for normal operation.
Further, sequential access throughput over VSD is very close
to the sequential access throughput of a local disk, and
random access throughput can also be sustained by VSD.
Therefore, this configuration will lead to performance after
failure of node 20_{k+1} very close to normal performance.
However, one trade-off is that moving database instance
DB_k may entail bringing down that instance and restarting it
on the spare node. The impact of this depends on the
workload. For decision support, failure of node 20_{k+1} will
likely impact most, if not all, of the running queries; thus,
bringing down and restarting DB_k as well will likely be
acceptable. For OLTP, this choice will depend on the
fraction of the workload impacted by node 20_{k+1} failure versus the
impact if both nodes 20_k and 20_{k+1} are brought down on
failure.
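
The load that node 20_k carries under each recovery option can be compared with a simple model. The figures are assumptions made for illustration: one database instance counts as a load of 1.0, and vsd_share is a hypothetical fraction standing for the VSD server work needed to support one remote instance; the disclosure gives no numeric value for it.

```python
def buddy_load(configuration, vsd_share=0.25):
    """Load on node 20_k, in units of one database instance's load.

    vsd_share is a hypothetical parameter, not a measured value.
    """
    if configuration == "buddy_takeover":  # prior art method (ii)
        return 2.0                          # runs DB_k and DB_{k+1}
    if configuration == "one_spare":       # FIG. 2
        return 1.0 + vsd_share              # runs DB_k, serves DB_{k+1}
    if configuration == "two_spares":      # FIG. 4
        return 2 * vsd_share                # serves both instances only
    raise ValueError(f"unknown configuration: {configuration}")


loads = {
    name: buddy_load(name)
    for name in ("buddy_takeover", "one_spare", "two_spares")
}
```

In this model the FIG. 4 configuration stays below the normal load of 1.0 whenever vsd_share is under 0.5, which is one way to read the statement that the VSD load is likely to be less than that for normal operation.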

One potential problem with this technique is handling
reintegration after node 20_{k+1} comes back up. In the simplest
case, node 20_{k+1} may have failed due to an operating system
crash, and a mere reboot may bring it back up. Ideally, one
would like to restore the system to a configuration that can
handle a subsequent failure (i.e., a mode which has enough
spare nodes designated). One alternative is to move database
instances DB_k and DB_{k+1} back to nodes 20_k and 20_{k+1}
respectively. However, this typically requires bringing down
the database instances and then restarting them at the
original nodes. An extension to the technique disclosed
herein that handles reintegration without bringing down the
database instances involves copying the data from disks 30_k
and 30_{k+1} to twin-tailed (65) disks 70_1 and 70_2 on the spare
nodes. This can be done concurrently with database
operation after failure reconfiguration. By access through VSD the
data can be mirrored to twin-tailed disks on the spare nodes,
and any concurrent updates to the disks must be mirrored at
both node 20_k and the formerly spare nodes 40_1 and 40_2.
Those skilled in the art will readily appreciate that this can
be accomplished by appropriate synchronization. Nodes 20_k
and 20_{k+1} can thereafter be designated as spare nodes in the
system for recovering from future failures of other nodes.
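
One possible form of the synchronization mentioned above is sketched here, assuming a disk can be modeled as a mapping from block number to contents. The MirroredCopy class and its method names are hypothetical, and a single lock stands in for whatever protocol an actual implementation would use.

```python
import threading


class MirroredCopy:
    """Copy blocks to a new disk while database updates continue."""

    def __init__(self, original, target):
        self.original = original      # dict: block number -> contents
        self.target = target          # dict filled by the copy
        self.lock = threading.Lock()  # serializes copy steps and updates

    def write(self, block, data):
        # A concurrent database update is mirrored to both disks.
        with self.lock:
            self.original[block] = data
            self.target[block] = data

    def copy_all(self):
        # Background copy, one block at a time. Holding the lock per
        # block keeps a stale read from overwriting a mirrored update.
        for block in list(self.original):
            with self.lock:
                self.target[block] = self.original[block]


disk_30 = {0: "a", 1: "b", 2: "c"}
disk_70 = {}
mirror = MirroredCopy(disk_30, disk_70)

mirror.write(1, "b2")  # update arrives before the copy reaches block 1
mirror.copy_all()
mirror.write(2, "c2")  # update arrives after the copy has finished
```

Once copy_all returns, the two disks hold the same contents and stay identical for as long as every update goes through write.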

As mentioned above, a mere reboot may suffice to restore
the failed node 20_{k+1} to working condition. In such a case,
it may be desirable to avoid takeover altogether. However,
this requires deferring the takeover decision until after the
failed node has been restarted and has attempted to reboot,
which increases recovery time accordingly. The following
technique depicted in the flow diagram of FIG. 6 can be used
to overlap recovery actions with the attempted reboot of the
failed node. When node 20_{k+1} fails (Step 200), its buddy node
20_k takes over its disks and initiates recovery, i.e., performs
file system recovery and log-based database instance
recovery (Step 210). During this recovery period, the failed node
20_{k+1} can attempt to reboot (Step 220). If it succeeds
(Decision 230, "Y"), it resumes control of its original disks
and restarts a database instance locally (Step 250). If it fails
to reboot (Decision 230, "N"), the database instance is
started on the buddy node 20_k or on a spare node, as
discussed above (Step 240). In all cases, restart of the
database instance is immediate, since recovery of the disks
has already been performed by the buddy node 20_k.
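
The overlap of recovery with the reboot attempt in FIG. 6 can be sketched with two concurrent tasks. This is a minimal sketch: the function name, its parameters, and the callables passed in are hypothetical stand-ins for the actual disk recovery and reboot actions.

```python
from concurrent.futures import ThreadPoolExecutor


def recover_with_reboot(buddy_recover, try_reboot,
                        failed="20_k+1", fallback="20_k"):
    """Return the node on which the database instance is restarted.

    buddy_recover and try_reboot are caller-supplied callables;
    try_reboot returns True if the failed node came back up.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Step 210: the buddy takes over the disks and runs file system
        # and log-based database recovery.
        recovery = pool.submit(buddy_recover)
        # Step 220: the failed node attempts to reboot in the meantime.
        reboot = pool.submit(try_reboot)
        recovery.result()           # disk recovery is complete
        rebooted = reboot.result()  # Decision 230
    # Step 250 if the reboot succeeded, otherwise Step 240 on the buddy
    # node or on a spare node supplied as the fallback.
    return failed if rebooted else fallback


on_success = recover_with_reboot(lambda: None, lambda: True)
on_failure = recover_with_reboot(lambda: None, lambda: False)
on_spare = recover_with_reboot(lambda: None, lambda: False, fallback="40_1")
```

Because disk recovery finishes regardless of the reboot outcome, the instance can be started at once on whichever node is chosen.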

The techniques of the instant invention are applicable to 
database processing systems, and in particular to any parti- 
tioned (shared nothing) database systems. 

The present invention can be included in an article of
manufacture (e.g., one or more computer program products)
having, for instance, computer useable media. The media
has embodied therein, for instance, computer readable
program code means for providing and facilitating the
mechanisms of the present invention. The article of manufacture
can be included as part of a computer system or sold
separately.

While the invention has been particularly shown and 
described with reference to preferred embodiment(s) 
thereof, it will be understood by those skilled in the art that 
various changes in form and details may be made therein 
without departing from the spirit and scope of the invention. 

What is claimed is: 

1. A method for recovering from a failure of a first 
processing node in a database processing system having a 
plurality of processing nodes, comprising: 

running a first database instance on the first processing 
node, the first processing node and a second processing 
node having commonly connected thereto at least one 
first storage device for storing first data for the first 
database instance; 

detecting a failure of the first processing node; 

providing, to a third processing node, access to the first 
data on the at least one storage device through the 
second processing node; and

running the first database instance on the third processing 
node, including accessing the first data on the at least 
one first storage device through the second processing 
node, thereby recovering from the failure of the first 
processing node. 

2. The method of claim 1, further comprising: 
copying the first data from the at least one first storage 

device to at least one second storage device connected 
to the third processing node; 
wherein said running the first database instance on the 
third processing node includes mirroring subsequent 
updates to the first database instance to the first data on 
the at least one first storage device and the copied first 
data on the at least one second storage device. 

3. The method of claim 2, further comprising, after a 
restart of the first processing node: 

designating the first processing node as a first spare 
processing node in the database processing system. 

4. The method of claim 1, wherein the at least one first 
storage device comprises two storage devices each twin- 
tailed-connected to the first and second processing nodes. 

5. The method of claim 1, wherein the at least one first 
storage device comprises multiple storage devices each 
multi-tailed-connected to the first, second and other process- 
ing nodes in the database processing system. 

6. The method of claim 1, wherein the first data comprises 
a partition of a partitioned shared nothing database resident 
on the database processing system. 

7. The method of claim 1, wherein said providing, to the 
third processing node, access to the first data includes using 
a Virtual Shared Disk utility having a server portion on the 




5,907,849 




second processing node and a client portion on the third 
processing node. 

8. The method of claim 1, wherein the third processing 
node is a designated spare processing node in the database 
processing system.

9. The method of claim 1, further comprising: 

prior to said detecting the failure of the first processing 
node, running a second database instance on the second 
processing node, the at least one first storage device 
having stored therein second data for the second
database instance; and
after said detecting the failure of the first processing node: 
providing, to a fourth processing node, access to the 
second data on the at least one first storage device 
through the second processing node, and 
running the second database instance on the fourth 
processing node including accessing the second data 
on the at least one first storage device through the 
second processing node. 

10. The method of claim 9, further comprising: 

copying the first data and the second data from the at least
one first storage device to at least one second storage 
device commonly connected to the third and fourth 
processing nodes; 

wherein said running the first database instance on the 
third processing node includes mirroring subsequent
updates to the first database instance to the first data on 
the at least one first storage device and to the copied 
first data on the at least one second storage device; and 

wherein said running the second database instance on the 
fourth processing node includes mirroring subsequent
updates to the second database instance to the second 
data on the at least one first storage device and to the 
copied second data on the at least one second storage 
device. 

11. The method of claim 10, further comprising, after a
restart of the first processing node: 

designating the first processing node as a first spare 
processing node in the database processing system; and 

designating the second processing node as a second spare 
processing node in the database processing system.

12. The method of claim 9, wherein the at least one first 
storage device comprises two storage devices each twin- 
tailed-connected to the first and second processing nodes. 

13. The method of claim 9, wherein the at least one first 
storage device comprises multiple storage devices each
multi-tailed-connected to the first, second and other
processing nodes in the database processing system.

14. The method of claim 9, wherein the first and second 
data each comprise respective partitions of a partitioned 
shared nothing database resident on the database processing
system. 

15. The method of claim 9, wherein said providing, to the 
fourth processing node, access to the second data includes 
using a Virtual Shared Disk utility having a server portion on 
the second processing node and a first client portion on the
fourth processing node. 

16. The method of claim 15, wherein said providing, to 
the third processing node, access to the first data includes 
using the Virtual Shared Disk utility having the server 
portion on the second processing node and a second client
portion on the third processing node. 

17. The method of claim 10, wherein the third processing 
node and the fourth processing node are designated spare 
processing nodes in the database processing system. 

18. A method for recovering from a failure of a first
processing node in a database processing system having a 
plurality of processing nodes, comprising: 



running a first database instance on the first processing 
node, the first processing node and a second processing 
node having commonly connected thereto at least one 
storage device for storing first data for the first database 
instance; 

detecting a failure of the first processing node; 

performing database recovery of the first database 
instance on the second processing node including 
accessing the first data on the at least one storage device 
through the second processing node; 

during said performing database recovery, attempting to 
restart the first processing node; and 

if said attempting results in a successful restart of the first 
processing node, thereafter restarting the first database 
instance on the first processing node including access- 
ing the first data on the at least one storage device 
through the first processing node, or 

if said attempting does not result in a successful restart of 
the first processing node, thereafter running the first 
database instance on the second processing node 
including accessing the first data on the at least one 
storage device through the second processing node, 
thereby recovering from the failure of the first process- 
ing node. 

19. The method of claim 18, wherein the at least one 
storage device comprises two storage devices each twin- 
tailed-connected to the first and second processing nodes. 

20. The method of claim 18, wherein the at least one 
storage device comprises multiple storage devices each 
multi-tailed-connected to the first, second and other process- 
ing nodes in the database processing system. 

21. The method of claim 18, wherein the first data 
comprises a partition of a partitioned shared nothing data- 
base resident on the database processing system. 

22. A method for recovering from a failure of a first 
processing node in a database processing system having a 
plurality of processing nodes, comprising: 

running a first database instance on the first processing 
node, the first processing node and a second processing 
node having commonly connected thereto at least one 
storage device for storing first data for the first database 
instance; 

detecting a failure of the first processing node; 
performing database recovery of the first database 
instance on the second processing node including 
accessing the first data on the at least one storage device 
through the second processing node; 
during said performing database recovery, attempting to 

restart the first processing node; and 
if said attempting results in a successful restart of the first 
processing node, thereafter restarting the first database 
instance on the first processing node including access- 
ing the first data on the at least one storage device 
through the first processing node, or 
if said attempting does not result in a successful restart of 
the first processing node, thereafter: 
providing, to a third processing node, access to the first 
data on the at least one storage device through the 
second processing node, and 
running the first database instance on the third process- 
ing node, including accessing the first data on the at 
least one storage device through the second process- 
ing node, thereby recovering from the failure of the 
first processing node. 

23. The method of claim 22, wherein the at least one 
storage device comprises two storage devices each twin- 
tailed-connected to the first and second processing nodes. 

24. The method of claim 22, wherein the at least one
storage device comprises multiple storage devices each 
multi-tailed-connected to the first, second and other process- 
ing nodes in the database processing system. 

25. The method of claim 22, wherein the first data 
comprises a partition of a partitioned shared nothing data- 
base resident on the database processing system.

26. The method of claim 22, wherein said providing, to 
the third processing node, access to the first data includes 
using a Virtual Shared Disk utility having a server portion on 
the second processing node and a client portion on the third 
processing node. 

27. The method of claim 22, wherein the third processing 
node is a designated spare processing node in the database 
processing system. 

28. In a partitioned shared nothing database processing 
system, a method for recovering from a failure of a first 
processing node of a pair of processing nodes having 
twin-tailed-connected thereto at least one storage device, the 
first processing node having a first database instance running 
thereon and accessing a first data partition on the at least one 
storage device prior to the failure, the method comprising: 

providing, to a third processing node, access to the first 
data partition on the at least one storage device through 
a second processing node of the pair of processing 
nodes; and 

running, on the third processing node, a first replacement 
database instance for the first database instance which 
was running on the first processing node prior to the 
failure thereof, including accessing the first data parti- 
tion on the at least one storage device through the 
second processing node, thereby recovering from the 
failure of the first processing node. 

29. The method of claim 28, wherein said providing, to 
the third processing node, access to the first data partition 
includes using a Virtual Shared Disk utility having a server 
portion on the second processing node and a first client 
portion on the third processing node. 

30. The method of claim 29, wherein the second node had 
a second database instance running thereon and accessing a 
second data partition on the at least one storage device prior 
to the failure of the first processing node, the method further 
comprising: 

providing, to a fourth processing node, access to the 
second data partition on the at least one storage device 
through the second processing node; and 

running, on the fourth processing node, a second replace- 
ment database instance for the second database instance 
which was running on the second processing node prior 
to the failure of the first processing node, including 
accessing the second data partition on the at least one 
storage device through the second processing node. 

31. The method of claim 30, wherein said providing, to 
the fourth processing node, access to the second data par- 
tition includes using the Virtual Shared Disk utility having 
the server portion on the second processing node and a 
second client portion on the fourth processing node. 

32. A system for recovering from a failure of a first 
processing node in a database processing system having a 
plurality of processing nodes, comprising: 

means for running a first database instance on the first 
processing node, the first processing node and a second 
processing node having commonly connected thereto at 
least one first storage device for storing first data for the 
first database instance;

means for detecting a failure of the first processing node;

means for providing, to a third processing node, access to
the first data on the at least one storage device through
the second processing node; and

means for running the first database instance on the third
processing node, including means for accessing the first
data on the at least one first storage device through the
second processing node, thereby recovering from the
failure of the first processing node.

33. The system of claim 32, wherein the at least one first
storage device comprises two storage devices each twin- 
tailed-connected to the first and second processing nodes. 

34. The system of claim 32, wherein the at least one first 
storage device comprises multiple storage devices each
multi-tailed-connected to the first, second and other processing
nodes in the database processing system.

35. The system of claim 32, wherein the first data com- 
prises a partition of a partitioned shared nothing database 
resident on the database processing system. 

36. The system of claim 32, wherein said means for
providing, to the third processing node, access to the first
data includes a Virtual Shared Disk utility having a server
portion on the second processing node and a client portion
on the third processing node.

37. The system of claim 32, further comprising:

means for running a second database instance on the 
second processing node prior to detecting the failure of 
the first processing node, the at least one first storage 
device having stored therein second data for the second
database instance;

means for providing, to a fourth processing node, access
to the second data on the at least one first storage device
through the second processing node after detecting the
failure of the first processing node; and

means for running the second database instance on the 
fourth processing node including means for accessing 
the second data on the at least one first storage device 
through the second processing node.

38. The system of claim 37, wherein the at least one first
storage device comprises two storage devices each twin- 
tailed-connected to the first and second processing nodes. 

39. The system of claim 37, wherein the at least one first 
storage device comprises multiple storage devices each
multi-tailed-connected to the first, second and other processing
nodes in the database processing system.

40. The system of claim 37, wherein the first and second 
data each comprise respective partitions of a partitioned 
shared nothing database resident on the database processing
system.

41. The system of claim 37, wherein said means for 
providing, to the fourth processing node, access to the 
second data includes a Virtual Shared Disk utility having a 
server portion on the second processing node and a first
client portion on the fourth processing node.

42. The system of claim 41, wherein said means for 
providing, to the third processing node, access to the first 
data includes the Virtual Shared Disk utility having the 
server portion on the second processing node and a second
client portion on the third processing node.

43. In a partitioned shared nothing database processing 
system, a system for recovering from a failure of a first 
processing node of a pair of processing nodes having 
twin-tailed-connected thereto at least one storage device, the
first processing node having a first database instance running
thereon and accessing a first data partition on the at least one
storage device prior to the failure, the system comprising:

means for providing, to a third processing node, access to
the first data partition on the at least one storage device 
through a second processing node of the pair of pro- 
cessing nodes; and 

means for running, on the third processing node, a first
replacement database instance for the first database
instance which was running on the first processing node
prior to the failure thereof, including means for accessing
the first data partition on the at least one storage
device through the second processing node, thereby
recovering from the failure of the first processing node.

44. The system of claim 43, wherein said means for 
providing, to the third processing node, access to the first 
data partition includes a Virtual Shared Disk utility having
a server portion on the second processing node and a first
client portion on the third processing node. 

45. The system of claim 44, wherein the second node had 
a second database instance running thereon and accessing a 
second data partition on the at least one storage device prior
to the failure of the first processing node, the system further
comprising: 

means for providing, to a fourth processing node, access 
to the second data partition on the at least one storage 
device through the second processing node; and 

means for running, on the fourth processing node, a 
second replacement database instance for the second 
database instance which was running on the second 
processing node prior to the failure of the first process- 
ing node, including means for accessing the second 
data partition on the at least one storage device through 
the second processing node. 

46. The system of claim 45, wherein said means for 
providing, to the fourth processing node, access to the 
second data partition includes the Virtual Shared Disk utility 
having the server portion on the second processing node and 
a second client portion on the fourth processing node. 

47. An article of manufacture comprising a computer 
usable medium having computer readable code means 
therein for recovering from a failure of a first processing 
node in a database processing system having a plurality of 
processing nodes, the computer readable program code 
means in said article of manufacture comprising: 

computer readable program code means for running a first 
database instance on the first processing node, the first
processing node and a second processing node having 
commonly connected thereto at least one first storage 
device for storing first data for the first database 
instance; 

computer readable program code means for detecting a
failure of the first processing node; 

computer readable program code means for providing, to 
a third processing node, access to the first data on the 
at least one storage device through the second processing
node; and

computer readable program code means for running the 
first database instance on the third processing node, 
including code means for accessing the first data on the 
at least one first storage device through the second 
processing node, thereby recovering from the failure of
the first processing node. 

48. The article of manufacture of claim 47, wherein the 
first data comprises a partition of a partitioned shared 
nothing database resident on the database processing
system.

49. The article of manufacture of claim 47, wherein said 
code means for providing, to the third processing node,
access to the first data includes a Virtual Shared Disk utility
having a server portion on the second processing node and 
a client portion on the third processing node. 

50. The article of manufacture of claim 47, further com- 
prising: 

computer readable program code means for running a 
second database instance on the second processing 
node prior to detecting the failure of the first processing 
node, the at least one first storage device having stored 
therein second data for the second database instance; 

computer readable program code means for providing, to 
a fourth processing node, access to the second data on 
the at least one first storage device through the second 
processing node after detecting the failure of the first 
processing node; and 

computer readable program code means for running the 
second database instance on the fourth processing node 
including code means for accessing the second data on 
the at least one first storage device through the second 
processing node. 

51. The article of manufacture of claim 50, wherein the 
first and second data each comprise respective partitions of 
a partitioned shared nothing database resident on the data- 
base processing system. 

52. The article of manufacture of claim 50, wherein said 
code means for providing, to the fourth processing node, 
access to the second data includes a Virtual Shared Disk 
utility having a server portion on the second processing node 
and a first client portion on the fourth processing node. 

53. The article of manufacture of claim 52, wherein said 
code means for providing, to the third processing node, 
access to the first data includes the Virtual Shared Disk 
utility having the server portion on the second processing 
node and a second client portion on the third processing 
node. 

54. An article of manufacture comprising a computer 
usable medium having computer readable program code 
means therein for recovering, in a partitioned shared nothing
database system, from a failure of a first processing node of
a pair of processing nodes having twin-tailed-connected
thereto at least one storage device, the first processing node
having a first database instance running thereon and access- 
ing a first data partition on the at least one storage device 
prior to the failure, the computer readable program code 
means in said article of manufacture comprising: 

computer readable program code means for providing, to 
a third processing node, access to the first data partition 
on the at least one storage device through a second 
processing node of the pair of processing nodes; and 

computer readable program code means for running, on 
the third processing node, a first replacement database 
instance for the first database instance which was 
running on the first processing node prior to the failure 
thereof, including code means for accessing the first 
data partition on the at least one storage device through 
the second processing node, thereby recovering from 
the failure of the first processing node. 

55. The article of manufacture of claim 54, wherein said 
code means for providing, to the third processing node, 
access to the first data partition includes a Virtual Shared 
Disk utility having a server portion on the second processing 
node and a first client portion on the third processing node. 

56. The article of manufacture of claim 55, wherein the 
second node had a second database instance running thereon 
and accessing a second data partition on the at least one 
storage device prior to the failure of the first processing 
node, the article of manufacture further comprising:

computer readable program code means for providing, to
a fourth processing node, access to the second data 
partition on the at least one storage device through the 
second processing node; and 

computer readable program code means for running, on 
the fourth processing node, a second replacement data- 
base instance for the second database instance which 
was running on the second processing node prior to the 
failure of the first processing node, including code
means for accessing the second data partition on the at
least one storage device through the second processing 
node. 

57. The article of manufacture of claim 56, wherein said 
code means for providing, to the fourth processing node, 
access to the second data partition includes the Virtual 
Shared Disk utility having the server portion on the second 
processing node and a second client portion on the fourth 
processing node. 
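
The recovery flow recited in the claims above can be illustrated with a small simulation. The following is only a sketch under stated assumptions and is not the patented implementation: the class names (Disk, Node, VsdServer, VsdClient) and the function recover are hypothetical, a dictionary stands in for the storage device, and the Virtual Shared Disk utility is modeled as a simple server/client forwarding pair. It shows a replacement database instance on a spare node reaching a partition through the surviving node of a twin-tailed pair.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Hypothetical storage device: partition name -> stored data."""
    partitions: dict = field(default_factory=dict)


@dataclass
class Node:
    """Hypothetical processing node."""
    name: str
    alive: bool = True
    local_disks: list = field(default_factory=list)  # physically attached disks
    instances: dict = field(default_factory=dict)    # instance name -> reader


class VsdServer:
    """Server portion: must run on a node physically attached to the disk."""

    def __init__(self, node, disk):
        if not any(d is disk for d in node.local_disks):
            raise ValueError(f"{node.name} is not attached to the disk")
        self.node = node
        self.disk = disk

    def read(self, partition):
        if not self.node.alive:
            raise RuntimeError(f"server node {self.node.name} is down")
        return self.disk.partitions[partition]


class VsdClient:
    """Client portion: forwards reads from a remote node to the server."""

    def __init__(self, node, server):
        self.node = node
        self.server = server

    def read(self, partition):
        return self.server.read(partition)


def recover(failed, survivor, spare, disk, instance, partition):
    """Run a replacement for `instance` on `spare`, with access to
    `partition` provided through `survivor` (assumed recovery policy)."""
    if failed.alive:
        raise RuntimeError("no failure detected")
    failed.instances.pop(instance, None)
    client = VsdClient(spare, VsdServer(survivor, disk))
    spare.instances[instance] = lambda: client.read(partition)
    return client


if __name__ == "__main__":
    disk = Disk({"p1": "rows-1", "p2": "rows-2"})
    n1 = Node("node1", local_disks=[disk])
    n2 = Node("node2", local_disks=[disk])
    spare = Node("node3")
    n1.alive = False  # simulated failure of the first processing node
    recover(n1, n2, spare, disk, "db1", "p1")
    print(spare.instances["db1"]())
```

In this sketch the spare node has no physical connection to the disk, so every read it issues goes through the server portion on the surviving node. Moving the second database instance to a fourth node, as in the dependent claims, would amount to one more client attached to the same server.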

*****